Graphorall

Back

DB course

  • Algebra

    • preliminaries: relation, attribute, schema, tuple, component, domain, current instance, key

    • operations

        • set ops: union, intersect, minus
        • filter ops: select
        • relation ops: product, joins
        • mapping ops: rename, project
      • equivalence

        • RS=πL(σC(R×S))R\Join S=\pi_L(\sigma_C(R\times S))

    • extended operators

      • deduplicate δ\delta

      • aggregation

      • grouping γ\gamma

      • outer join (left,right, and theta)

      • sorting τ\tau

    • datalog

  • Formalization

    • FD

      • key -> tuple and key is minimal, superkey is superset of key

      • rules

      • closure

        • BAs+    AsbB\in As^+ \iff As\to b

        • {superkey}+={all attributes}

        • check whether a FD follows from a given set

      • minimal basis

        • 1.All the FD’s in B have singleton right sides.
        • 2.If any FD is removed from B, B is no longer a basis.
        • 3.If for any FD in B we remove one or more attributes from the left side of F, the result is no longer a basis (why?)
      • projecting

      • minimize

        • remove those follow from others

        • remove left side redundancy

    • BCNF

      • any FD As->Bs, As is superkey

      • decompose: not dependency preserving

      • chase test for lossless join

    • 3NF

      • dependency preserving

      • any FD As->Bs, As is superkey or Bs are each in a key

      • decompose: find min-basis, a relation for each FD, add a key relation

    • MD

      • intuitively, As->->Bs holds if As determine a set of values of Bs.

      • trivial: As ->-> subset of As, As->->\As(others=\emptyset)

      • FD promotion: As->Bs => As->->Bs(in tabular test, b1=b2b_1=b_2)

      • transitive

      • complementation: As->->Bs => As->->others

      • 4NF

        • any MVD As->->Bs, As is superkey

      • chase test: initially two tuples t and u, use FDs and MVDs to produce v

      • projecting

        • simplification(not reviewed)

    • E-R Model

      • many->one(less than), many->one(exactly, rounded arrowhead), many—(degree constrains)many, roles

      • multiway relations -> relation entity&multi binary relations

      • isa(subclass relationship)

        • differs from object-oriented approach, entity may be in multi sub entity sets

        • root entity set

      • remove redundant entity set: referred once in relation, all attrs are in key

      • weak entity set (and its supporting relationship)

        • def: key is composed of attributes , some or all of which belong to another entity set

      • convert to relational design

        • combining many-one relationship to many side

        • weak entity set: don’t need supporting relation

        • classes: null method, OO method(enumerate all subtrees)

    • Hard Problems

      • 设有关系 R(A,B,C,D,E,P),F={AB→C,C→D,E→A,B→C,CA→E,BD→A} 1、写出关系 R 的所有候选键,并指出关系 R 为第几范式。要求说明原因。 (4 分) 2、求出该函数依赖集的最小集 Fm。写出求解过程。 (7 分) 3、将 R 分解为具有无损连接性和依赖保持性的 3NF。写出分解过程。 (5 分)

        • Fm={B->C,C->D,E->A,AC->E,B->A}

  • SQL lang

    • components

      • DDL: data definition lang, schema and constraint etc.
      • DML: data manipulation lang, query and transaction etc.
      • DCL: data control lang, permission and grant etc.
    • relations: stored, view, temp tables

    • schema

      • CREATE TABLE/VIEW <name> (...)

        • inline constrains after attribute

        • or table-wise constrain after all attrs

      • DROP TABLE/VIEW <name> (...)

      • ALTER TABLE/VIEW <name> (...)

    • query

      • SELECT <>
        FROM <table>[alias],...
        WHERE{
        CONDITIONS
        }
        GROUP BY ()
        HAVING <condition>
        sql
      • operators

        • not equal <>

        • pattern matching: _, %, '', 'pattern' ESCAPE '<escape char>'

        • NULL: calc result is NULL, comparison result is UNKNOWN

        • EXISTS, [NOT] IN, > ALL, > ANY(existence)

        • CROSS JOIN, a JOIN b ON <c>, FULL|LEFT|RIGHT OUTERJOIN

        • set operators normally eliminates duplicate, use a UNION ALL b to avoid

    • modification

  • SQL other features

    • Constraint

      • kinds

        • default value

        • key

          • primary key not nullable, unique nullable
          • can’t agree on all attrs, unless one is null
          • two nulls don’t agree on each other
        • foreign key

          • policy: CASCADE, SET NULL, DEFAULT(reject)

        • not null

        • trivial check

        • assertion

          • CREATE ASSERTION xxx CHECK (...)

      • naming

      • defer

        • [NOT] DEFERRABLE [INITIALLY DEFERRED/IMMEDIATE]

        • SET CONSTRAINT xxx DEFERRED

      • scope

        • attribute: checked only when target attribute changes

        • tuple: checked more frequently

        • table(assertions)

    • Trigger

      • CREATE TRIGGER XXX
        AFTER|BEFORE|INSTEAD-OF UPDATE|DELETE|INSERT ON <table>
        REFERENCING xxx
        FOR EACH ROW|STATEMENT
        [WHEN <condition>]
        [BEGIN]
        statements;
        [END]
        plaintext
    • View

      • updatable view

      • using instead of trigger

      • materialize view

    • Index

      • estimate query cost (simple)

        • tuple locate

          • by index, 1r per index access

          • no index, full table scan

        • tuple access

          • if clustered, then may be 1r

          • else num of pages tuple is distributed

        • tuple insert

          • usually zero-cost to find a slot

          • 1r per index update

  • Storage

    • disc access model

      • find track -> move to track -> wait for spinning -> start transferring

        • move to track: head start/stop + head move(n-tracks/velocity)

        • TODO spinning: half?avg? :LOGBOOK: CLOCK: [2022-11-02 Wed 08:31:22]—[2023-04-16 Sun 16:27:16] => 3967:55:54 :END:

        • transferring: size-block(sectors+gaps)/size-track * time per rotation

    • physical optimize

      • organize data by cylinders

      • mirroring disks

        • enhance reliability

        • speed up reading(different part per disk) not writing

      • scheduling

        • elevator algorithm

      • intelligent buffering

    • robustness

      • checksum

      • redundant storage(stable storage, RAID)

        • TODO RAID 4(one main disk, bottleneck) 5(symmetric backup) 6(multiple ) :LOGBOOK: CLOCK: [2022-11-02 Wed 09:25:01]—[2023-04-16 Sun 16:27:21] => 3967:02:20 :END:

    • storage format

      • schema

        • number of fields, type, order, name

      • fixed-length format

        • header

          • pointer to schema, length, timestamp

      • variable-length format

        • a pointer to other than first variable field

        • header->pointers->fixed-fields->var-fields

        • dedup in-record by another level pointers

        • separate variable fields

        • spanned records

      • pack into blocks

        • header

          • one or more other blocks

          • role of this block(normal data or overflow)

          • which relation the records belongs to

          • record offsets

            • real offset in-block

            • or forwarding address to another block

            • or tombstone indicating it’s deleted

          • timestamp

        • two-ends growing to center

          • variable header and record length

    • addressing

      • physical disk address

      • logical address

        • more compact

        • tradeoff: flexibility <-> cost

          • e.g. move records of indirection

        • logical->phisical

        • logical->memory

      • pointer swizzling

        • avoid repeatedly translating addr

        • strategies

          • automatic

          • on demand

        • unswizzling when returning to disk

          • need mem->logical indexing

      • pinned records and blocks

        • pinned: object is referred to by a swizzled pointer

        • use spare bytes(logical-memaddr) to make a linked list

    • modification

      • insertion

      • deletion

        • immediate reclaim space

        • mark delete

      • update

        • fixed-length easy

        • dynamic via deletion&insertion

  • Index

    • types

      • dense: one by one key-pointer pairs

      • sparse: adjacent entries determine a range corresponding to keys

      • multi-leveled

      • secondary

        • must be dense

      • indirection

        • reduce replicated key entries

        • key->bucket of many pointers->tuple

      • revert index(document retrieval)

    • data structures

      • B-trees

        • rules

          • node min item
          • leave node
          • multiset case: min new key, blank key
        • insertion and deletion

          • recursively split
          • borrow from sibling node
      • hash table

        • extensible hash table

          • bucket directory

            • initially first 1 bit
            • once bucket used up, split the bucket, use k+1 bits, other buckets still remain unchanged
            • once directory length not enough, double the directory, b0/b1 points to same bucket
        • linear hash

    • multi dimensional

      • partial match, range, nearest-neighbor

      • hash like approach

        • grid file

          • dynamic insertion: overflow block, or insert grid line

          • hard to find optimal split

        • partitioned hashing

          • good at partial match

      • tree like approach

        • kd-tree

          • long search path
        • R-tree

      • bitmap approach

        • field -> collection of bitmaps, one bitmap per value

        • compressed bitmap

          • run-length encoding
    • exercise

      • 3.1 线性表删除时如何处理位数i减少

      • 3.2 hard h(k)的每一位都相同无法区分,若重复数多于bucksize如何split

      • 7.5 如何编码趋近log2i\log_2i

  • Exec

    • Physical Ops

      • one pass

      • two pass

      • sort

      • index

      • hash

    • algebra simplification

    • cost model

    • logical & physical plan

    • exercise

      • 4.3.2

      • 4.3.5

      • 4.4.5、4.4.8 子表少于M如何利用空闲缓冲区

      • 4.5.5 考虑磁盘IO特性,计算开销

  • Schedule

    • serializability

    • conflicts: ignore actions from same trans conflict equiv: by transforming non-conflict ops conflict serializable: conflict equiv to some serial schedule view serializable: ignore blind write

    • lock

      • shared-exclusive

      • upgrading lock

        • shared->upgrade: may lead to deadlock

          • update lock: may later write, singleton but can coexist with shared

      • increment lock

      • procedure

        • naive lock -> 2PL: conflict serializable, unlock after all locks

        • lock table

          • group mode

          • lock list: waiting and holding

          • grant policy

      • granularity

        • intention lock

          • 组模式,某种锁优于另一种

        • tree protocol

          • 观察到无需对父节点修改时unlock

    • deadlock

    • timestamp&validation

      • read/write too late

      • uncommited TX may abort -> delay until determined

    • exercise

      • image.png

        • r1a,w1a,r1b,w1b,r2b,w2b,r2a,w2a r2b,w2b,r2a,w2a,r1a,w1a,r1b,w1b 注意到其对A、B的操作有交换律,(42)\binom{4}{2}

        • Compare with 18.2.3

      • 2.4 wn(A)...ri(A)wi(A)ri+1(A)wi+1(A)...rn(A)w_n(A)...r_i(A)w_i(A)r_{i+1}(A)w_{i+1}(A)...r_n(A)

      • 2.6 r3c,r1a,w1c,r2a,w2b,r3b

      • 3.1? image.png

        • case serial: 2 case interleave: 1A,2B->1B; 2B,1A->1A; 因此只有1A2B/1B2A两组内部可交替,共(42)2=36\binom{4}{2}^2=36

      • 4.1 会发生什么怎么说?

      • 4.6

      • 4.7 所有的?更新?如何考虑锁? a. r2c; b. i2c,i1b; c. u1b; d. w1b;

  • Transaction

    • failure recovery

      • undo

        • write log->write disk->commit, commit data
      • redo

        • write log->commit->write disk, commit log
        • checkpoint: commit data
      • undo/redo *

      • checkpoint

        • 避免向前追溯
        • start ckpt(Ti); end ckpt,若有end则Ti无需恢复
        • start ckpt(Ti)中的Ti似乎可以省略?
      • logical log

        • logical equiv, compensating action

    • recoverability

      • (读)依赖的值必须先commit

      • 日志落盘顺序需与产生顺序一致(group commit)

      • cascading rollback: dirty data, ACR调度, strict locking

      • rollback policy

        • block: cache refresh

        • smaller: using log

    • exercise

      • 1.1 2PL和严格封锁的区别? strict只有在事务结束时才能统一释放所有w锁

      • 1.4 “恢复时产生脏读”?

      • 1.5 又是数调度!

      • 2.6

      • 6.2.2 如何计数 归纳递推,转换为向L1,[subproblem(n1)]L1,[subproblem(n-1)]中插入O1

  • Course Project

    • 杨元钊、钟祎诚、邹奕杨

    • 架构: 整体为C/S架构,Server端采用MySQL数据库,Client使用Python+QtQuick开发。其中Server部分使用一台华为云ECS,根据MySQL官网指示部署了MySQL 8.0(最初使用docker部署,但性能低于常规值3个数量级,后期推测是由于ECS的虚拟化云硬盘IO延迟高)。Client部分为Qt前端+Python后端,主程序由Python调用PySide(Qt)运行QML引擎,引擎运行QML代码实现界面显示与交互,QML代码通过引擎调用后端类获取数据;后端类使用pymysql执行编写好的SQL语句与数据库交互。

    • 分工: 杨元钊负责数据的爬取清洗及数据库Schema的设计、创建; 邹奕杨负责客户端后端类的编写、SQL语句的编写及部分语句优化; 钟祎诚负责服务端的部署、客户端整体和UI部分的编写、SQL优化。

    • DBProject.drawio_1675502686622_0.png

Linked References (5)

pg exec #DB course

  • ExecFinish 处理向client返回结果之外的副作用

    • 计入执行时间

    • 触发器、未完毕的写操作(如只需返回3条,但需更改所有)

  • 事务虚实ID

    • 进程、进程内计数
    • 全局唯一性,进程内、进程、不同次运行中进程号可能相同
  • CLog与XLog,事务管理信息与数据信息分别存储

simple strategy to identify queries worthy of optimization: don’t optimize queries below a threshold #DB course

DONE two weeks’ homework #homework #DB course DEADLINE: <2022-11-02 Wed> :LOGBOOK: CLOCK: [2022-10-27 Thu 23:05:37]—[2022-11-02 Wed 23:26:08] => 144:20:31 :END:

DONE review 1-7 chapters #DB course DEADLINE: <2022-11-01 Tue> :LOGBOOK: CLOCK: [2022-10-27 Thu 23:07:00]—[2023-04-16 Sun 16:26:52] => 4097:19:52 :END:

DB course
https://blog.graphorall.top/blog/DB%20course
Author rubbishzyc
Published at April 16, 2023